Modelling Techniques for Twitter Contents: A Step beyond Classification based Approaches
Authors
Abstract
In this paper we present our first participation in the RepLab Campaign. Our work focuses on two contributions. The first is the use of an IR method to address the Polarity and Filtering tasks. These two tasks can be seen as the same problem: finding the most relevant class with which to annotate a given tweet. To that end, we applied a classical IR approach, using the tweet content as a query against an index containing the models of the classes used to annotate tweets. To model these classes, we propose using the Kullback-Leibler Divergence (KLD) to extract their most representative terminology. We also propose different data sources and different ways of modelling these data through KLD. The second contribution concerns the Topic Detection task. Instead of a clustering-based technique, we propose applying Formal Concept Analysis (FCA) to represent the contents in a lattice structure. To extract topics from the lattice, we applied an FCA notion: stability. According to the results, our IR-based approach proved very satisfactory for the Polarity task, while it seems less suitable for the Filtering task. FCA modelling, on the other hand, proved to be a promising methodology for Topic Detection, achieving highly successful results.
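As a rough illustration of the first contribution, the sketch below weights each term of a class by its pointwise contribution to the KL divergence between the class language model and the whole collection, keeps the top-weighted terms as the class model, and then scores a tweet against every class model. It is a minimal sketch under stated assumptions, not the authors' system: the whitespace tokenizer, the `top_k` cutoff, the additive scoring (a stand-in for a real IR engine and index), the toy tweets, and all function names are hypothetical.

```python
import math
from collections import Counter


def kld_term_weights(class_docs, all_docs):
    """Pointwise KL contribution per term: P(t|class) * log(P(t|class) / P(t|collection))."""
    class_counts = Counter(tok for doc in class_docs for tok in doc.lower().split())
    coll_counts = Counter(tok for doc in all_docs for tok in doc.lower().split())
    class_total = sum(class_counts.values())
    coll_total = sum(coll_counts.values())
    weights = {}
    for term, count in class_counts.items():
        p_class = count / class_total
        # all_docs contains class_docs, so the collection count is never zero here.
        p_coll = coll_counts[term] / coll_total
        weights[term] = p_class * math.log(p_class / p_coll)
    return weights


def build_class_models(labelled_tweets, top_k=20):
    """Keep the top_k positively weighted terms of each class as its model."""
    all_docs = [text for text, _ in labelled_tweets]
    by_class = {}
    for text, label in labelled_tweets:
        by_class.setdefault(label, []).append(text)
    models = {}
    for label, docs in by_class.items():
        weights = kld_term_weights(docs, all_docs)
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        models[label] = {term: w for term, w in top if w > 0}
    return models


def classify(tweet, models):
    """Use the tweet as a query: sum the weights of its terms in each class model."""
    tokens = tweet.lower().split()
    scores = {
        label: sum(model.get(tok, 0.0) for tok in tokens)
        for label, model in models.items()
    }
    return max(scores, key=scores.get)


# Toy data, invented purely for illustration.
train = [
    ("great phone love it", "positive"),
    ("love the great camera", "positive"),
    ("terrible battery hate it", "negative"),
    ("hate the terrible screen", "negative"),
]
models = build_class_models(train)
label = classify("love this great screen", models)
```

Terms that are equally frequent in a class and in the collection (here "it" and "the") get a weight of zero and drop out of the class model, which is the sense in which KLD selects the most representative terminology.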
Similar resources
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter sentiment classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks such as Twitter intensively. Twitter lets users publish tweets to say what they think about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...
A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks
A rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Identifying the language of rumors can therefore help in detecting them. Previous research on rumor detection has focused more on the contextual information in reply tweets and less on the content features of the original rumor. Most of the studies have been in...
Detecting Social Spam Campaigns on Twitter
The popularity of Twitter depends greatly on the quality and integrity of the content contributed by its users. Unfortunately, Twitter has attracted spammers who post spam content that pollutes the community. Social spamming is more successful than traditional methods such as email spamming because it exploits the social relationships between users. Detecting spam is the first and very critical step in the battle of ...
Twitter Sentiment Analysis: Lexicon Method, Machine Learning Method and Their Combination
This paper presents a step-by-step methodology for Twitter sentiment analysis. Two approaches are tested to measure variations in public opinion about retail brands. The first, a lexicon-based method, uses a dictionary of words with assigned semantic scores to calculate the final polarity of a tweet, and incorporates part-of-speech tagging. The second, machine learning approach, tackl...
Numerical modelling of the underground roadways in coal mines: uncertainties caused by use of empirical-based downgrading methods and in situ stresses
Numerical modelling techniques are no longer new to the mining industry or to civil engineering projects. These techniques have been widely used for rock engineering problems such as stability analysis and support design of roadways and tunnels, caving and subsidence prediction, and stability analysis of rock slopes. Despite the significant advancement in the computational mechanics and availabili...